CharacTer: Translation Edit Rate on Character Level
Abstract
Recently, several metrics have confirmed the effectiveness of character-level evaluation of machine translation output. This work proposes translation edit rate on character level (CharacTER), which calculates the character-level edit distance while performing shift edits on the word level. The novel metric shows high system-level correlation with human rankings, especially for morphologically rich languages. It outperforms the strong CHRF metric by up to 7% correlation on different metric tasks. In addition, we normalize the edit distance in CharacTER by the hypothesis sentence length, which provides significant improvements compared to normalizing by the reference sentence length.
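The abstract combines three ingredients: word-level shifts, a character-level edit distance, and normalization by hypothesis length. The following is a minimal sketch of how they could fit together, not the authors' implementation. Several choices here are assumptions: the shift step greedily moves one word at a time (the published metric uses TER-style shifts with its own conditions), each shift costs 1, and sentence length is counted in characters with single spaces between words. The function names `char_edit_distance`, `greedy_word_shifts`, and `charac_ter` are hypothetical.

```python
def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over characters with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def greedy_word_shifts(hyp_words, ref_words):
    """Hypothetical simplified shift step: repeatedly move the single hypothesis
    word that most reduces the character-level edit distance to the reference.
    Returns the shifted word list and the number of shifts performed."""
    ref = " ".join(ref_words)
    words = list(hyp_words)
    best = char_edit_distance(" ".join(words), ref)
    shifts = 0
    while True:
        best_candidate = None
        for i in range(len(words)):
            for j in range(len(words)):
                if i == j:
                    continue
                cand = words[:i] + words[i + 1:]
                cand.insert(j, words[i])  # move word i to position j
                d = char_edit_distance(" ".join(cand), ref)
                if d < best and (best_candidate is None or d < best_candidate[0]):
                    best_candidate = (d, cand)
        if best_candidate is None:  # no shift lowers the distance any further
            return words, shifts
        best, words = best_candidate  # strictly decreasing, so the loop ends
        shifts += 1


def charac_ter(hypothesis: str, reference: str, normalize_by: str = "hypothesis") -> float:
    """Sketch of (shift cost + character edit distance) / sentence length.
    Assumption: each shift costs 1; lengths count characters incl. single spaces."""
    hyp_words, ref_words = hypothesis.split(), reference.split()
    shifted, n_shifts = greedy_word_shifts(hyp_words, ref_words)
    edits = char_edit_distance(" ".join(shifted), " ".join(ref_words))
    words = hyp_words if normalize_by == "hypothesis" else ref_words
    return (n_shifts + edits) / max(len(" ".join(words)), 1)


if __name__ == "__main__":
    print(charac_ter("cat the", "the cat"))  # one shift, zero character edits -> 1/7
    print(charac_ter("the cat", "the cats", normalize_by="reference"))  # -> 1/8
```

In the first example, a single word shift repairs the word order, so only the shift cost remains. The `normalize_by` switch shows the two normalizations the abstract compares. Only the denominator changes: the hypothesis has 7 characters here and the reference has 8.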
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, and the hyphen is used to separate their parts. Persian also has multi-part words. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of a half-space character. This common incorrect use of space leads to some s...
The Effects of Word Order and Segmentation on Translation Retrieval
This research looks at the effects of word order and segmentation on translation retrieval performance for an experimental Japanese-English translation memory system. We implement a number of both bag-of-words and word order-sensitive similarity metrics, and test each over character-based and word-based indexing. The translation retrieval performance of each system configuration is evaluated empi...
OCR Error Correction Using Statistical Machine Translation
In this paper, we explore the use of a statistical machine translation system for optical character recognition (OCR) error correction. We investigate the use of word-level and character-level models to support a translation from OCR system output to correct French text. Our experiments show that character-based and word-based machine translation correction make significant improvements to the quality of t...
Scene Character Detection and Recognition with Cooperative Multiple-Hypothesis Framework
To handle the variety of scene characters, we propose a cooperative multiple-hypothesis framework which consists of an image operator set module, an Optical Character Recognition (OCR) module and an integration module. Multiple image operators activated by multiple parameters probe suspected character regions. The OCR module is then applied to each suspected region and returns multiple candidat...
A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction
We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dict...